Multimodal City-Identification on Flickr Videos Using Acoustic and Textual Features

Authors

  • Howard Lei
  • Jaeyoung Choi
  • Gerald Friedland
Abstract

We performed city verification of videos based on their audio and metadata, using the MediaEval Placing Task's video set, which contains consumer-produced videos "from the wild." Eighteen cities were used as targets, for which acoustic and language models were trained, and against which test videos were scored. We obtained the first known results for the city-verification task, with a minimum equal error rate (EER) of 21.8 percent. This result is well above chance, even though the videos contain very few city-specific audio and metadata features. We also demonstrate the complementarity of audio and metadata for this task.
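
For readers unfamiliar with the metric, the equal error rate is the operating point at which the false-accept rate (non-target videos accepted as the city) equals the false-reject rate (target videos rejected). The following is a minimal sketch of how an EER could be computed from verification scores. It is not the authors' scoring code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false-accept and
    false-reject rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejects: true-city trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False accepts: other-city trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, invented for this example only.
toy_target = [0.9, 0.8, 0.7, 0.4]    # test video really is from the target city
toy_impostor = [0.6, 0.5, 0.3, 0.2]  # test video is from a different city
eer, threshold = equal_error_rate(toy_target, toy_impostor)
print(f"EER = {eer:.3f} at threshold {threshold}")
```

With a finite set of trials the two error rates rarely cross exactly, so the sketch reports their average at the closest point; evaluation toolkits often interpolate instead.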

Related resources

City-Identification on Flickr Videos Using Acoustic Features

This article presents an approach that utilizes audio to discriminate the city of origin of consumer-produced videos, a task that is hard to imagine even for humans. Using a subset of the MediaEval Placing Task's Flickr video set, we conducted an experiment with a setup similar to a typical NIST speaker recognition evaluation run. Our assumption is that the audio within the same city might be ...

Visual Concept Features and Textual Expansion in a Multimodal System for Concept Annotation and Retrieval with Flickr Photos at ImageCLEF2012

This paper presents our submitted experiments in the Concept Annotation and Concept Retrieval tasks using Flickr photos at ImageCLEF 2012. In this edition, we applied new strategies for both the textual and the visual subsystems included in our multimodal retrieval system. The visual subsystem focuses on extending the low-level feature vector with concept features. These concept features have be...

Geotagging Flickr Photos And Videos Using Language Models

This paper presents an experimental framework for the Placing tasks, both estimation and verification, at MediaEval Benchmarking 2016. The proposed framework provides results for four runs: first, using metadata (such as user tags and titles of images and videos); second, using visual features extracted from the images (such as Tamura); third, using the textual and visual features together; and ...

How Spatial Segmentation improves the Multimodal Geo-Tagging

In this paper we present a hierarchical, multimodal approach in combination with different granularity levels for the Placing Task at the MediaEval benchmark 2012. Our approach makes use of external resources like gazetteers to extract toponyms in the metadata and of visual and textual features to identify similar content. First, the boundaries detection recognizes the country and its dimensio...

Automatic Genre and Show Identification of Broadcast Media

Huge amounts of digital video are produced and broadcast every day, leading to giant media archives. Effective techniques are needed to make such data more accessible. Automatic metadata labelling of broadcast media is an essential task for multimedia indexing, where it is standard to use multimodal input for such purposes. This paper describes a novel method for automatic detection...

Publication date: 2012